
Nozomi UDI connector #1656

Merged

Conversation

thangaraj-ramesh (Contributor)

Nozomi UDI connector

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 93.96552% with 70 lines in your changes missing coverage. Please review.

Project coverage is 86.44%. Comparing base (184636f) to head (13db2fe).
Report is 17 commits behind head on develop.

❗ Current head 13db2fe differs from pull request most recent head 7548f7c. Consider uploading reports for the commit 7548f7c to get more accurate results.

Files Patch % Lines
...dules/nozomi/stix_translation/query_constructor.py 91.17% 27 Missing ⚠️
...er_modules/nozomi/stix_translation/transformers.py 73.80% 22 Missing ⚠️
...fter_modules/nozomi/stix_transmission/connector.py 92.46% 11 Missing ⚠️
...dules/nozomi/test/stix_transmission/test_nozomi.py 98.19% 4 Missing ⚠️
...r_modules/nozomi/stix_transmission/error_mapper.py 85.00% 3 Missing ⚠️
...test/stix_translation/test_nozomi_stix_to_query.py 98.71% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1656      +/-   ##
===========================================
- Coverage    86.52%   86.44%   -0.08%     
===========================================
  Files          600      596       -4     
  Lines        51314    51200     -114     
===========================================
- Hits         44397    44260     -137     
- Misses        6917     6940      +23     

☔ View full report in Codecov by Sentry.

thangaraj-ramesh and others added 3 commits February 27, 2024 14:07
},
{
    "key": "ipv6-addr.value",
    "object": "src_ip"
Member:

This mapping will create both ipv4-addr and ipv6-addr objects. You need a transformer to check the IP version before creating the object.

thangaraj-ramesh (Contributor Author) commented Apr 2, 2024:

The stix-shifter framework internally checks for ipv4-addr and ipv6-addr pattern matches, so a transformer is not required.
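For context, a version-checking transformer along the lines the reviewer suggested might look like the sketch below. The FilterIPv6Value name is hypothetical, and the import path assumes the usual stix-shifter utils layout; this is not the merged code.

import ipaddress

from stix_shifter_utils.stix_translation.src.utils.transformers import ValueTransformer


class FilterIPv6Value(ValueTransformer):
    """Hypothetical transformer: return the value only when it parses as IPv6,
    so the ipv6-addr mapping does not also fire for IPv4 strings."""

    @staticmethod
    def transform(obj):
        try:
            if isinstance(ipaddress.ip_address(obj), ipaddress.IPv6Address):
                return obj
        except ValueError:
            # Not a valid IP address at all.
            pass
        return None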

},
{
    "key": "ipv6-addr.value",
    "object": "dst_ip"
Member:

Same problem as ip_src.

Contributor Author:

The stix-shifter framework internally checks for ipv4-addr and ipv6-addr pattern matches, so a transformer is not required.

{
    "key": "network-traffic.src_ref",
    "object": "nt",
    "references": "mac_addr_src"
Member:

So it replaces the src_ref that is created by ip_src. You may need to create a separate network-traffic object if there are two sources, such as IP and MAC.

Contributor Author:

The IP and MAC addresses are from the same source. We have observed that sometimes in an alert only one of these attributes is filled with a value; otherwise it is null. Hence we have added the references in both cases.



class ChainNameValue(ValueTransformer):
"""A value transformer to add default 'kill_chain_name' as 'mitre-attack' with obj"""
Member:

Can you add input/output values in the comment for all the transformers? It's easier for maintenance in the future.

Contributor Author:

Added comments with input/output values.

Added input/output samples for all the transformers.
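For illustration, the documented transformer might look like the sketch below. The class name and docstring come from the diff above; the input/output sample and the phase-name normalization are assumptions, not the merged code.

class ChainNameValue(ValueTransformer):
    """A value transformer to add default 'kill_chain_name' as 'mitre-attack' with obj.

    Input: 'Initial Access'
    Output: [{'kill_chain_name': 'mitre-attack', 'phase_name': 'initial-access'}]
    """

    @staticmethod
    def transform(obj):
        # Normalize e.g. 'Initial Access' -> 'initial-access' (assumed behavior).
        return [{'kill_chain_name': 'mitre-attack',
                 'phase_name': str(obj).replace(' ', '-').lower()}]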
if data:
    data = Connector.get_results_data(data)
if metadata:
    return_obj['data'] = data if data else []
Collaborator:

page_size is static, so this can overflow past the total_records length.

You can just do
data[0:total_records]

Contributor Author:

From the CLI we can give custom offset values. If the user gives offset=10 and length=1000, then we need to fetch the 10th to 1010th records. We can't always use an offset of 0.

Collaborator:

Supporting offsets with metadata would be quite odd, and I can't really think of a use case where this would be necessary.

As a note, this comment was originally made with the idea of a page_size that does not change, hence the need to account for a data list that could be greater than the requested amount. For example, if your page_size is 99 and the length is 100, then you could get a resulting data list of size 198 (98 more than requested). You'd need to trim it down to 100 (or to total_records, which should usually be 0 + length when using metadata).
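A minimal sketch of the trimming being described, assuming a fixed page_size; fetch_page() and fetch_trimmed() are hypothetical stand-ins, not the connector's functions.

def fetch_page(page_number, page_size):
    # Stand-in for the real API call; returns one full page of dummy records.
    start = (page_number - 1) * page_size
    return list(range(start, start + page_size))

def fetch_trimmed(length, page_size=99):
    data, page = [], 1
    while len(data) < length:
        data.extend(fetch_page(page, page_size))
        page += 1
    # With page_size=99 and length=100, two pages yield 198 records;
    # slicing trims the overflow down to the requested amount.
    return data[:length]

assert len(fetch_trimmed(100)) == 100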

thangaraj-ramesh (Contributor Author) commented Apr 16, 2024:

For all the batch iterations, offset and length are used to get the requested records. Please check the latest code.

page_number = int(metadata.get('page_number', 1))
# overwrite the offset for current batch.
offset = abs(offset - ((page_number - 1) * self.api_client.api_page_size))
Collaborator:

Can you expand on this? I'm not really sure what this does.

If my offset is 0, this becomes larger over iterations. If my offset is 4,000, this becomes smaller over time and eventually starts to increase.

thangaraj-ramesh (Contributor Author) commented Apr 25, 2024:

The offset passed as the function parameter is not directly used for slicing; after calculating the offset for the current batch, that value is used for slicing.

If the offset is 0, page_number will start from 1.

If the offset is 4000, page_number will come as 5 from the metadata. It will not start from 1 because 4 pages are already processed.

Example 1: offset starts from 0.
For the first batch:
offset=0 length=2000 pagesize=1000 page_number=1
offset = abs(0 - ((1 - 1) * 1000))
The offset for the current batch is 0.

For the second batch:
offset=2000 length=2000 pagesize=1000 page_number=3
offset = abs(2000 - ((3 - 1) * 1000))
The offset for the current batch is 0.

For the third batch:
offset=4000 length=2000 pagesize=1000 page_number=5
offset = abs(4000 - ((5 - 1) * 1000))
The offset for the current batch is 0.

Example 2: offset does not start from 0.
For the first batch:
offset=500 length=2000 pagesize=1000 page_number=1
offset = abs(500 - ((1 - 1) * 1000))
The offset for the current batch is 500.

For the second batch:
offset=2500 length=2000 pagesize=1000 page_number=3
offset = abs(2500 - ((3 - 1) * 1000))
The offset for the current batch is 500.
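The arithmetic from both examples can be checked with a small sketch; batch_offset is an illustrative name, not the connector's function.

def batch_offset(offset, page_number, api_page_size):
    # Offset within the current batch rather than the global offset.
    return abs(offset - ((page_number - 1) * api_page_size))

# Example 1: offset starts from 0.
assert batch_offset(0, 1, 1000) == 0       # first batch
assert batch_offset(2000, 3, 1000) == 0    # second batch
assert batch_offset(4000, 5, 1000) == 0    # third batch

# Example 2: offset does not start from 0.
assert batch_offset(500, 1, 1000) == 500   # first batch
assert batch_offset(2500, 3, 1000) == 500  # second batch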

Collaborator:

Can you provide some context on why this needs to be done? What happens if my offset is less than the page_number * pagesize?

Contributor Author:

Updated the pagination as per the suggestion.

offset = int(offset)
length = int(length)
result_count = offset
total_records, page_number, offset = self.handle_metadata(offset, length, metadata)
Collaborator:

Can you clean up some of these variable names? We have length, result_count, expected_result_count, total_records, and local_result_count. It's difficult to follow what these are actually used for.

You could add a comment if you'd like to keep the names.

thangaraj-ramesh (Contributor Author) commented Apr 25, 2024:

Inline comments are added.


return_obj = self.handle_data(data, metadata, offset, total_records, return_obj)
return_obj = self.handle_data(data, offset, expected_result_count, return_obj)
Collaborator:

How are you handling partial pages?

For example:
page_size = 100
length = 101

What happens when it tries to get that remaining result?

Contributor Author:

For example, consider offset=0, length=101, pagesize=100.

It will make two API calls.
From the first page, it will fetch 100 records.
From the second page, it will fetch 100 records.
From those 200 records, the first 101 records will be sliced.
In total, it will return 101 records.
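The arithmetic described here, as a small sketch; the variable names are illustrative.

import math

offset, length, page_size = 0, 101, 100
pages_needed = math.ceil((offset + length) / page_size)   # 2 API calls
fetched_records = list(range(pages_needed * page_size))   # 200 dummy records
returned = fetched_records[offset:offset + length]        # first 101 records sliced
assert pages_needed == 2 and len(returned) == 101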

Collaborator:

What happens with the metadata in this case? What will the next query be when it goes to get the next 101 records?

Contributor Author:

Updated the pagination as per the suggestion.

try:
    offset = int(offset)
    length = int(length)
    # result_count is the overall record count from all the previous batches processed.
Collaborator:

This seems off. Why is offset being used to hold the total record count? We discussed before that there is a way to do this without worrying about the total number of results we've received.

I understand that what you are trying to handle is the remaining data from a partially handled page, i.e.
length = 80, page_size = 100.
The results will have a remainder of 20.

The easiest way to do this is to simply persist how much of the page is processed as metadata. For example, in the above scenario, you return back
{"page": 1, "page_index": 80}.

Then on the next call you return back
{"page": 2, "page_index": 60}

Then
{"page": 3, "page_index": 40}

You are keeping track of both the current page and where you were in that page with the metadata. The first query should handle the offset, and subsequent ones should just use the page + page_index.

I'm not saying this implementation doesn't work, but I am concerned that it's relying a bit too much on the caller behaving a certain way. A simple page + page_index implementation is straightforward and easier to understand. After the first call, we just ignore offset and return length number of records.
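A minimal sketch of the page + page_index scheme described above; fetch_page() and next_batch() are hypothetical stand-ins for the real API call and connector logic. It reproduces the metadata sequence in the example.

def fetch_page(page, page_size=100):
    # Stand-in for the real API call; returns one full page of dummy records.
    start = (page - 1) * page_size
    return list(range(start, start + page_size))

def next_batch(length, metadata, page_size=100):
    page = metadata.get('page', 1)
    index = metadata.get('page_index', 0)
    data = []
    while len(data) < length:
        page_data = fetch_page(page, page_size)
        take = min(length - len(data), len(page_data) - index)
        data.extend(page_data[index:index + take])
        index += take
        if index >= len(page_data):   # page exhausted, move to the next one
            page, index = page + 1, 0
    return data, {'page': page, 'page_index': index}

batch1, meta1 = next_batch(80, {})
assert meta1 == {'page': 1, 'page_index': 80}
batch2, meta2 = next_batch(80, meta1)
assert meta2 == {'page': 2, 'page_index': 60}
batch3, meta3 = next_batch(80, meta2)
assert meta3 == {'page': 3, 'page_index': 40}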

Contributor Author:

Updated the pagination as per the suggestion.

DerekRushton merged commit 31a9b9c into opencybersecurityalliance:develop on May 6, 2024
3 checks passed